library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(bayesrules)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
Label each as strongly or somewhat favoring \(\pi\), above or below:
The arguments that correspond to the graph shown is:
alpha = 3, beta = 8, y = 2, n = 4
I’ll check it:
plot_beta_binomial(alpha = 3,beta=8,y=2,n=4)
Yes! A Match!
A ginkgo tree sheds all of their leaves at the same time after the first frost. Randi thinks that the local ginkgo tree will drop all its leaves next Monday. She asks 5 friends what they think. Identify reasonable Beta priors for each belief.
This would convey a Beta that is pretty left skewed.
This uniform distribution feels appropriate because he has no idea and presumably no prior knowledge of tree behavior given how he hates trees.
Looks like Katie has some prior knowledge and put thought into it so I would think she is more certain of her answer, which is right skewed.
This is also right skewed but less certain than Katie.
This shows similar to Daryl but in reverse. That he is left skewed but less certain than Katie.
The local ice cream shop is open until it runs out of ice cream for the day. It’s 2pm and Chad wants to pick up an ice cream cone. He asks his coworkers about the chance \(\pi\) that the shop is still open. Their Beta priors are below:
Kimya - Beta(1,2)
Fernando - Beta(0.5,1)
Ciara - Beta(3,10)
Taylor - Beta(2,0.1)
Visualize and summarize in words each coworker’s prior understanding of Chad’s chances to satisfy his ice cream craving.
Kimya thinks its unlikely but their not too sure.
plot_beta(1,2)
Fernando thinks its not likely and their somewhat sure.
plot_beta(0.5,1)
Ciara thinks its not very likely and their pretty sure.
plot_beta(3,10)
Taylor thinks its pretty likely and their pretty sure.
plot_beta(2,0.1)
Chad finds that on 3 of the past 7 days they were still open at 2pm. For each of the coworkers:
Going to simulate the model by first simulating 10,000 values of each Beta prior and then use the binomial model:
Kimya
Beta(1,2), y = 3, n = 7
set.seed(369)
kimya_sim <- data.frame(pi = rbeta(10000,1,2)) |>
mutate(y=rbinom(10000,size=7,prob=pi))
kimya_posterior <- kimya_sim |>
filter(y==3)
ggplot(kimya_posterior,aes(x=pi)) +
geom_histogram(bins=20,color="white") +
theme_minimal() +
labs(title="Kimya Posterior")
kimya_posterior |>
summarize(mean(pi))
## mean(pi)
## 1 0.400331
Fernando
set.seed(369)
fernando_sim <- data.frame(pi = rbeta(10000,0.5,1)) |>
mutate(y=rbinom(10000,size=7,prob=pi))
fernando_posterior <- fernando_sim |>
filter(y==3)
ggplot(fernando_posterior,aes(x=pi)) +
geom_histogram(bins=20,color="white") +
theme_minimal() +
labs(title="Fernando Posterior")
fernando_posterior |>
summarize(mean(pi))
## mean(pi)
## 1 0.4227631
Ciara
set.seed(369)
ciara_sim <- data.frame(pi = rbeta(10000,3,10)) |>
mutate(y=rbinom(10000,size=7,prob=pi))
ciara_posterior <- ciara_sim |>
filter(y==3)
ggplot(ciara_posterior,aes(x=pi)) +
geom_histogram(bins=20,color="white") +
theme_minimal() +
labs(title="Ciara Posterior")
ciara_posterior |>
summarize(mean(pi))
## mean(pi)
## 1 0.3028835
Taylor
set.seed(369)
taylor_sim <- data.frame(pi = rbeta(10000,2,0.1)) |>
mutate(y=rbinom(10000,size=7,prob=pi))
taylor_posterior <- taylor_sim |>
filter(y==3)
ggplot(taylor_posterior,aes(x=pi)) +
geom_histogram(bins=20,color="white") +
theme_minimal() +
labs(title="Taylor Posterior")
taylor_posterior |>
summarize(mean(pi))
## mean(pi)
## 1 0.535582
For each coworker:
I’m going to use the summarize beta binomial to get the exact posterior model and the mean for each of these coworker’s priors:
Kimya
summarize_beta_binomial(alpha=1,beta=2,y=3,n=7)
## model alpha beta mean mode var sd
## 1 prior 1 2 0.3333333 0.000 0.05555556 0.2357023
## 2 posterior 4 6 0.4000000 0.375 0.02181818 0.1477098
plot_beta_binomial(alpha=1,beta=2,y=3,n=7)
The posterior model looks similar to the simulated model for Kimya. The simulated mean was 0.400331 and the exact mean was 0.40000. These are pretty close!
Fernando
summarize_beta_binomial(alpha=0.5,beta=1,y=3,n=7)
## model alpha beta mean mode var sd
## 1 prior 0.5 1 0.3333333 1.0000000 0.08888889 0.2981424
## 2 posterior 3.5 5 0.4117647 0.3846154 0.02549627 0.1596755
plot_beta_binomial(alpha=0.5,beta=1,y=3,n=7)
Again, the posterior model looks pretty similar to the simulated model. The mean in the simulated model was 0.4227621 and the exact mean was 0.4117647. This was a bit more different than Kimya’s mean from the simulated mean.
Ciara
summarize_beta_binomial(alpha=3,beta=10,y=3,n=7)
## model alpha beta mean mode var sd
## 1 prior 3 10 0.2307692 0.1818182 0.01267963 0.1126039
## 2 posterior 6 14 0.3000000 0.2777778 0.01000000 0.1000000
plot_beta_binomial(alpha=3,beta=10,y=3,n=7)
Also has a similar shape for the posterior of the simulated model and the exact model. The mean for the simulated model was 0.3028835 and the exact mean was 0.30000. Again, a very similar value.
Taylor
summarize_beta_binomial(alpha=2,beta=0.1,y=3,n=7)
## model alpha beta mean mode var sd
## 1 prior 2 0.1 0.9523810 1.0000000 0.01462951 0.1209525
## 2 posterior 5 4.1 0.5494505 0.5633803 0.02451036 0.1565579
plot_beta_binomial(alpha=2,beta=0.1,y=3,n=7)
The simulated histogram for Taylor’s model is the most interesting to me with peaks in more spread out places. Overall though, the shape of the results are similar to the posterior I found here as well. The simulated mean is 0.535582 and the exact mean is 0.5494505. Once again, a pretty similar value.
For each situation, identify if the prior or the data has more of an influence or if there is an equal compromise.
Beta(1,4), Y = 8, n = 10 – the data has more influence on the posterior
Beta(20,3), Y = 0, n = 1 – the prior has more influence on the posterior
Beta(4,2), Y = 1, n = 3 – the prior and data have equal influence on the posterior
Beta(3,10), Y = 10, n = 13 – the prior and data have equal influence on the posterior
Beta(20,2), Y = 10, n = 200 – the data has more influence on the posterior
Plot each of the above scenarios and compare the scaled likelihoods, priors, and posterior pdfs:
plot_beta_binomial(alpha = 1,beta=3,y=8,n=10) + labs(title="Graph A")
plot_beta_binomial(alpha=20,beta=3,y=0,n=1) + labs(title="Graph B")
plot_beta_binomial(alpha=4,beta=2,y=1,n=3) + labs(title="Graph C")
plot_beta_binomial(alpha=3,beta=10,y=10,n=13) + labs(title="Graph D")
plot_beta_binomial(alpha =20,beta=2,y=10,n=200) + labs(title="Graph E")
These visualizations show that the data has a stronger influence for graphs a and e. The priors have a stronger influence for graphs b. And the prior and data have similar influence for graphs c and d.
Let \(\pi\) denote the proportion of people that prefer dogs to cats. The prior is Beta(7,2).
Reasonable values should center around 77% (7/9).
plot_beta(7,2)
This survey gives a very very high rate of dog preference (95%) that will increase the mean of the prior, giving it higher density, and increase the certainty of the \(\pi\) values being so high above 75%, making the graph narrower around the high values.
plot_beta_binomial(alpha=7,beta=2,y=19,n=20)
This would pretty drastically change the mean \(\pi\) given how counter it is to the prior. The mean would lower and I would guess the certainty would be pretty consistent because its very different from the prior but the data evidence if very strong.
plot_beta_binomial(alpha=7,beta=2,y=1,n=20)
This data is pretty inconclusive. It doesn’t lean strongly towards preference for dogs or cats, but it does differ from the prior. I would guess that this lowers the mean and lowers the certainty, thus expanding the range of values.
plot_beta_binomial(alpha=7,beta=2,y=10,n=20)
This actually keeps the range of values pretty consistent it looks like. Mathematically speaking, a 50% success rate adds the same value to the alpha and the beta in the prior, so that the posterior model is given by Beta(17,12).
In each situation, identify n and y and then sketch the prior, scaled likelihood, and posterior pdf.
I’m going to use the formula that alpha_posterior = alpha_prior + y and beta_posterior = beta_prior + n - y. To solve for y and n I can transform those to be:
\(y = \alpha_{posterior} - \alpha_{prior}\)
\(n = \beta_{posterior} - \beta_{prior} + y\)
y = 8.5 - 0.5 = 8 n = 2.5 - 0.5 + y = 10
plot_beta_binomial(alpha=0.5,beta=0.5,y=8,n=10)
y = 3.5 - 0.5 = 3 n = 10.5 - 0.5 + y = 13
plot_beta_binomial(alpha=0.5,beta=0.5,y=3,n=13)
y = 12 - 10 = 2 n = 15 - 1 + y = 16
plot_beta_binomial(alpha=10,beta=1,y=2,n=16)
y = 15 - 8 = 7 n = 6 - 3 + y = 10
plot_beta_binomial(alpha=8,beta=3,y=7,n=10)
y = 5 - 2 = 3 n = 5 - 2 + y = 6
plot_beta_binomial(alpha=2,beta=2,y=3,n=6)
y = 30 - 1 = 29 n = 3 - 1 + y = 31
plot_beta_binomial(alpha = 1,beta=1,y=29,n=31)
In each situation we have Beta(1,1) but different data. Identify the posterior model and plot them all
Using the same formula above, I can take the prior alpha and beta and use the data to get the posterior alpha and beta. In this case, 1 + Y = alpha_posterior and 1 + (n-Y) = beta_posterior.
Beta Posterior (11,4)
plot_beta_binomial(alpha=1,beta=1,y=10,n=13)
Beta Posterior (1,2)
plot_beta_binomial(alpha=1,beta=1,y=0,n=1)
Beta Posterior (101,31)
plot_beta_binomial(alpha=1,beta=1,y=100,n=130)
Beta Posterior (21, 101)
plot_beta_binomial(alpha=1,beta=1,y=20,n=120)
Beta Posterior (235,235)
plot_beta_binomial(alpha=1,beta=1,y=234,n=468)
Repeat exercise 4.11 with a prior of Beta(10,2). Again, using the formula. In this case, 10 + Y = alpha_posterior and 2 + (n-Y) = beta_posterior.
Y = 10, n = 13 – Beta (20,5)
Y = 0, n = 1 – Beta (10,3)
Y = 100, n = 130 – Beta (110,32)
Y = 20, n = 120 – Beta (30, 102)
Y = 234, n = 468 – Beta (244, 236)
plot_beta_binomial(alpha=10,beta=2,y=10,n=13) + labs(title="a")
plot_beta_binomial(alpha=10,beta=2,y=0,n=1) + labs(title="b")
plot_beta_binomial(alpha=10,beta=2,y=100,n=130) + labs(title="c")
plot_beta_binomial(alpha=10,beta=2,y=20,n=120) + labs(title="d")
plot_beta_binomial(alpha=10,beta=2,y=234,n=468) + labs(title="e")
We can screw up a Bayesian. A politician specifies the prior understanding of their approval rating by \(\pi \text{ ~ } Unif(0.5,1) \text{ with pdf } f(\pi) = 2 \text{ when } 0.5 =< \pi < 1 \text{, and } f(\pi) = 0 \text{ when } 0 < \pi < 0.5\)
Politician Prior
The politician believes that their approval rating has an equal probability of being anywhere between 50% and 100%.
Politician Posterior and Formula
Their posterior combines their previous understanding that the probability is equal for values between 0.5 and 1, and is not existent for under 0.5 with the data that they got a 0 approval rating. This makes it so their prior model is incompatible with the data they were given.
\(Mode(\pi) = \frac{\alpha - 1}{\alpha + \beta - 2}, \text{when } \alpha,\beta > 1\)
Mode Formula
The posterior mode converges to 0 as n increases to infinity. Because we have n in the denominator.
Let \(\pi\) be the probability of success for some event of interest You place Beta(2,3) prior on \(\pi\). Sequentially update your posterior for \(\pi\) with each new observation. Once again, I will use the above formula for how to use the data to update alpha posterior and beta posterior (adding successes to alpha and failures to beta).
Beta(3,3)
Beta(4,3)
Beta(4,4)
Beta(5,4)
All of this is equivalent to Y = 3, n = 4. Which would produce a Beta Posterior (5,4)
Let Beta(2,3) and update the posterior model after each five observations. I will do this using the same technique as above.
Beta(5,5)
Beta(6,9)
Beta(7,13)
Beta(9,18)
A show company develops a new ad for sneakers. Three employees share the same prior Beta(4,3). Each employee runs a different study with different data. The first employee has Y = 0, n = 1. The second employee has Y = 3, n = 10. The third employee has Y = 20, n = 100.
plot_beta(4,3)
The employees think that the probability that a user will click on the ad is likely around 60% and there is not a lot of certainty around that value.
I’ll use the alpha and beta conversion formulas for each of the employee’s observation data. Namely that alpha posterior = alpha prior + y; beta posterior = beta prior + n - y
Employee 1: Beta(4,4)
Employee 2: Beta(7,10)
Employee 3: Beta(24,83)
plot_beta_binomial(alpha=4,beta=3,y=0,n=1) + labs(title="Employee 1")
plot_beta_binomial(alpha=4,beta=3,y=3,n=10) + labs(title="Employee 2")
plot_beta_binomial(alpha=4,beta=3,y=20,n=100) + labs(title="Employee 3")
Employee 1 gathered basically no data, so their posterior looks pretty similar to their prior, but does have a lower mode for \(\pi\).
Employee 2 has a bit more info and the rate is lower than predicted. So there is a bit higher certainty that the posterior is closer to the observed data than the prior model.
Employee 3 ran a large survey with a lot more data and shows a lot of certainty centering around a much lower value for \(\pi\) (closer to 25%)
The above show company brings in a fourth employee. They start with the same Beta(4,3) prior but don’t collect their own data. They get each employees data sequentially.
Day 1 data: y = 0, n = 1
Beta(4,4)
Day 2 data: y = 3, n = 10
Beta(7,11)
Day 3 data: y = 20, n = 100
Beta(27,91)
plot_beta(4,3) + labs(title="Prior")
plot_beta(4,4) + labs(title="Day 1")
plot_beta(7,11) + labs(title ="Day 2")
plot_beta(27,91) + labs(title="Day 3")
Combining all of the employee data gives Y = 0 + 3 + 20 = 23 and n = 1 + 10 + 100 = 111. This will update the model:
alpha posterior = alpha prior + y = 4 + 23 = 27
beta posterior = beta prior + n - y = 3 + 111 - 23 = 91
Beta(27,91)
Analyze \(\pi\), the proportion of films that pass the Bechdel test. Specify the posterior and calculate the posterior mean and mode.
#Import data
data(bechdel,package="bayesrules")
#Analyze the movies from 1980
bechdel_80 <- bechdel |>
filter(year=="1980")
#Get a table that calculates the pass and fails from 1980 films
bechdel_80 |>
tabyl(binary) |>
adorn_totals("row")
## binary n percent
## FAIL 10 0.7142857
## PASS 4 0.2857143
## Total 14 1.0000000
There was a pass rate of 4 (y = 4) of 14 movies (n = 14).
This would provide a posterior Beta(5,11).
Mean = alpha / alpha + beta Mode = alpha - 1 / alpha + beta - 2
Mean = 5 / 5 + 11 = 5 / 16 = 0.3125
Mode = 5 - 1 / 5 + 11 - 2 = 4 / 14 = 0.2857143
#1990 data
bechdel_90 <- bechdel |>
filter(year=="1990")
bechdel_90 |>
tabyl(binary) |>
adorn_totals("row")
## binary n percent
## FAIL 9 0.6
## PASS 6 0.4
## Total 15 1.0
This provides a y of 6 and n = 15. To update the previous Beta(5,11) with this data gives us a new posterior of Beta(11,20).
Mean = 11 / 11 + 20 = 0.3548387
Mode = 11 - 1 / 11 + 20 - 2 = 0.3448276
#data from year 2000
bechdel_00 <- bechdel |>
filter(year=="2000")
bechdel_00 |>
tabyl(binary) |>
adorn_totals("row")
## binary n percent
## FAIL 34 0.5396825
## PASS 29 0.4603175
## Total 63 1.0000000
This gives y = 29, n = 63. Updating the Beta(11,20) gives a new posterior of Beta(40,54).
Mean = 40 / 40 + 54 = 0.4255319
Mode = 40 - 1 / 40 + 54 - 2 = 0.423913
#looking at data from the 80s,90s,00s all at once
bechdel_yrs <- bechdel |>
filter(year=="1980" | year=="1990" | year=="2000")
bechdel_yrs |>
tabyl(binary) |>
adorn_totals("row")
## binary n percent
## FAIL 53 0.576087
## PASS 39 0.423913
## Total 92 1.000000
This gives a total of y = 39 and n = 92 for all of those years. This would gives a posterior of Beta(40,54). This matches the posterior we got from the sequential analysis.
The mean and mode can be calculated the same way as in the sequential case.
Mean = 40 / 40 + 54 = 0.4255319
Mode = 40 - 1 / 40 + 54 - 2 = 0.423913
We can use Bayes to sequentially update our understanding of a parameter of interest. How is this different from what a frequentist approach would be? How is it similar?
Frequentist methods would just take the new data presented as a representation of the proportion of movies that pass the bechdel test without taking the prior into account. So each day, the most recent understanding would update with the data regardless of the priors we get, causing a lot more variance in how we see the stat day to day. It is similar in that if a frequentist adds yesterday’s data to make it cummulative, it should give a similar answer between the sequential analysis and the total. I.e., if day one the data they use is 4/14 then day 2 they use 4 + 6 / 14 + 15 = 10 / 29 etc. then the last day will produce the same value as if they took all pieces of data at once.